Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Author

  • Pawel Cichosz
Abstract

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor λ. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of the efficient and general implementation of TD(λ) for arbitrary λ, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that indeed only approximates TD(λ), but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using λ > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning.
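
To make the truncation idea concrete, the following is a minimal sketch, under assumed names (truncated_lambda_return, ttd_update, V, window), of how a truncated λ-return could be computed over a sliding window of the last m transitions and used as the target of a TD(0)-style update for the oldest state in the window. It illustrates the general idea rather than reproducing the paper's exact TTD procedure.

    # Sketch of the truncated-return idea (illustrative, not the paper's
    # exact pseudocode). V is a dict mapping states to value estimates;
    # "window" holds the last m transitions as (state, reward, next_state).
    def truncated_lambda_return(window, V, gamma, lam):
        # Bootstrap from the estimated value of the state at the end of the window.
        _, _, last_next_state = window[-1]
        G = V.get(last_next_state, 0.0)
        # Work backward, mixing the one-step target (weight 1 - lambda)
        # with the longer truncated return (weight lambda).
        for state, reward, next_state in reversed(window):
            G = reward + gamma * ((1.0 - lam) * V.get(next_state, 0.0) + lam * G)
        return G

    def ttd_update(window, V, gamma, lam, alpha):
        # TD(0)-style update applied to the state visited m steps earlier,
        # with the truncated lambda-return as its target.
        oldest_state = window[0][0]
        target = truncated_lambda_return(window, V, gamma, lam)
        V[oldest_state] = V.get(oldest_state, 0.0) + alpha * (target - V.get(oldest_state, 0.0))

With λ = 0 the target collapses to the ordinary one-step TD(0) target regardless of m, while λ close to 1 approaches the m-step truncated return, which is why λ > 0 can speed up learning while keeping the per-action cost essentially constant.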


Similar articles

Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda

Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of “bootstrapped” return estimates to make efficient use of sampled data. In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(λ) chooses a v...
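
For reference, the averaging mentioned above is usually formalized as the λ-return, which weights the n-step return by (1 − λ)λ^(n−1). The sketch below shows that weighting explicitly for a finite episode; the function and variable names are illustrative assumptions, not taken from the cited paper.

    # Illustrative lambda-return for a finite episode. rewards[k] is the
    # reward received on step k + 1; values[k] is the estimated value of the
    # state reached after that step (the final state is assumed terminal).
    def n_step_return(rewards, values, gamma, n):
        G = sum(gamma ** k * rewards[k] for k in range(min(n, len(rewards))))
        if n < len(rewards):
            G += gamma ** n * values[n - 1]  # bootstrap with the value estimate
        return G

    def lambda_return(rewards, values, gamma, lam):
        T = len(rewards)
        # Weight the n-step return by (1 - lambda) * lambda**(n - 1); the full
        # (Monte Carlo) return receives the remaining weight lambda**(T - 1).
        G = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, gamma, n)
                for n in range(1, T))
        return G + lam ** (T - 1) * n_step_return(rewards, values, gamma, T)

Setting lam = 0 recovers the one-step TD target and lam = 1 recovers the Monte Carlo return, with intermediate values blending the two families of estimators.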


True Online Emphatic TD(λ): Quick Reference and Implementation Guide

This document is a guide to the implementation of true online emphatic TD(λ), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the ge...


An Empirical Evaluation of True Online TD(λ)

The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm, in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the te...


The Analysis of Experimental Results of Machine Learning Approach

This article analyzes a reinforcement learning method in which a subject of learning is defined. The essence of this method is the selection of actions through a trial-and-error process and the awarding of deferred rewards. If an environment is characterized by the Markov property, then its step-by-step dynamics enable forecasting of subsequent states and awarding of subsequent rewards on the basi...


Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto, 2017) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...
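
For orientation, the one-step Q(σ) target interpolates between the Sarsa target (σ = 1, sample the next action) and the Expected Sarsa target (σ = 0, expectation under the policy). The sketch below uses illustrative names and omits the multi-step and eligibility-trace machinery of Q(σ, λ) discussed in the paper.

    # Illustrative one-step Q(sigma) backup target (assumed names). Q maps
    # (state, action) pairs to value estimates; policy(s, a) gives the
    # probability of choosing action a in state s; actions lists the actions
    # available in next_state.
    def q_sigma_target(reward, next_state, next_action, Q, policy, actions, gamma, sigma):
        expected_q = sum(policy(next_state, a) * Q[(next_state, a)] for a in actions)
        sampled_q = Q[(next_state, next_action)]
        # sigma blends the sampled (Sarsa) and expected (Expected Sarsa) backups.
        return reward + gamma * (sigma * sampled_q + (1 - sigma) * expected_q)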



Journal title:
  • J. Artif. Intell. Res.

Volume 2, Issue -

Pages -

Publication date: 1995